REPORT RESUMES



ED Oil IH AL odd 106. 

PARSING BY MATRIX--A DEVELOPMENT IN SYNTACTIC ANALYSIS OF 
RUSSIAN. RESEARCH IN MACHINE TRANSLATION. 

BY- STEIGER, AMELIA JANIOTIS 
WAYNE STATE UNIV., DETROIT, MICH. 

PUB DATE 31 AUG 66 

EDRS PRICE MF-$0.18 HC-$2.76 69P. 

DESCRIPTORS- *RUSSIAN, *SYNTAX, *MACHINE TRANSLATION, 
COMPUTATIONAL LINGUISTICS, COMPUTER PROGRAMS, MACHINE 
TRANSLATION, MT PROJECT, DETROIT, FULCRUM TECHNIQUE, PARSE, 
HYPERPARSE 

RESEARCH IN SYNTACTIC ANALYSIS OF RUSSIAN, WHICH WAS 
DEVELOPED IN A PROGRAM FOR COMPUTER-AIDED RUSSIAN-ENGLISH 
TRANSLATION, IS DESCRIBED. THE CORPUS CONSISTED OF 15 RUSSIAN 
MATHEMATICAL ARTICLES. THE THEORY USED IS THE "FULCRUM" 
APPROACH OF BUNKER-RAMO, BUT THE COMPUTER IMPLEMENTATION HAS 
DEVELOPED ALONG DISTINCT LINES. THREE TYPES OF SYNTACTIC 
ROUTINES ARE DESCRIBED IN THE ORDER OF THEIR 
APPLICATION--BLOCKING ROUTINES, PROFILING, AND PARSING 
(PARSE, HYPERPARSE). ALTHOUGH THE IMPROVED PARSING ROUTINE, 
HYPERPARSE, AND THE AUXILIARY DICTIONARY USED WITH IT ARE A 
FIRST APPROXIMATION TO SATISFACTORY LANGUAGE TRANSFER, 
ADDITIONAL CODING IS NEEDED FOR IMPROVED QUALITY OF 
TRANSLATION. (KL) 









Research in Machine Translation 

PARSING BY MATRIX: 

A Development in Syntactic Analysis of Russian 



by 




U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE 
OFFICE OF EDUCATION 

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE 
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS 
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION 
POSITION OR POLICY. 



Wayne State University 
Detroit, Michigan 

AL 000 106 



Research in Machine Translation 



PARSING BY MATRIX: 

A Development in Syntactic Analysis of Russian 



by 

Amelia Janiotis Steiger 
Wayne State University 



August 31, 1966 



DISTRIBUTION OF DOCUMENT UNLIMITED 

This study was supported and sponsored by: 

Department of the Navy 
Office of Naval Research 
Research and Studies Program 
Project No. NONR-2562(00) 



ACKNOWLEDGEMENT 

The author is indebted to Wayne T. Watson, formerly 
of the programming staff of the Wayne State University 
machine translation project, for his assistance, 
encouragement, and interest in the development of this 
paper. 












TABLE OF CONTENTS 

Preface 

INTRODUCTORY SUMMARY OF RESEARCH IN 
MACHINE TRANSLATION CONDUCTED AT 
WAYNE STATE UNIVERSITY 

PARSING BY MATRIX 

    PART I - The Blocking Routines 
    PART II - Profile 
    PART III - Parse 
    PART IV - Predicative Government 
    PART V - Hyperparse 

DISTRIBUTION LIST 



PREFACE 



There are almost as many approaches to syntactic analysis 
as there are groups working in the machine translation field. 
Of these groups, the following have worked on analysis of 
Russian syntax: Texas, Berkeley, Harvard, RAND, Georgetown, IBM, 
Bunker-Ramo, and Wayne State University. The approaches of 
these groups are summarized in the following paragraphs. 

Despite the differences in theory and method which exist 
among the groups, they all have the goal of analyzing the 
structure of the Russian sentence in order to effect a transla- 
tional transformation into English. Once the syntactic analysis 
is accomplished for Russian, semantic features can be brought 
into the program for the translation stage. 

The technical staff of the Linguistic Research Center at 
Texas consists of three main groups. One of these, viz. 
Descriptive Linguistics, functions mainly to provide a 
description of the structure of each language to be used in the 
translation system. As L.W. Tosh has indicated, "Our approach 
to describing language structure is a stratificational one."¹ 
He goes on to point out that this approach comprises three levels 
of analysis - lexical, syntactic, and semantic - and that the level 
of greatest interest to the linguist is the syntactic, on which 
those structural elements which account for number, tense, 
agreement, and word order are analyzed. In discussing research 
procedures, he mentions that research on language structures is 
text-oriented, and although they use general features of Russian 
structure, they proceed from textual occurrences. They select 
text that has been translated, using first the Russian alone 
for monolingual analysis, and then proceeding to the translation 
in order to construct a transfer grammar. 

Sydney M. Lamb, in his syntactic research at Berkeley, has 
been primarily concerned with developing "a system for tactic 
analysis in general", i.e., a description of arrangements. He 
states that "... the term syntax is traditionally used with 
reference to arrangements on the morphemic stratum." He specifies 
the form of the syntactic description as follows: "The syntax 
may be completely described by a list of distribution classes 
of items with the membership of each, and a list of constructions. 
A construction is characterized by specification of (1) the 
distribution classes which enter into it and their relative 
order, (2) the distribution-class membership of the constitutes."² 
No computer implementation of the method has been announced. 

The Harvard group uses the Predictive Syntactic Analysis 
Technique. Murray Sherry has written: "The method of 
predictive syntactic analysis is based on the premise that a Russian 
sentence can be scanned from left to right, and that at any 
point in this process it is possible both to determine the 
syntactic structure of the word under consideration on the 
basis of the predictions made during the analysis of the words 
to its left, and to predict the syntactic structures which will 
be encountered to the right of the current word." He points 
out further: "Predictions of syntactic structures are stored in 
a prediction pool which behaves somewhat like a pushdown store, 
a linear array of storage elements in which information is 
entered or removed from one end only, in accordance with a 
'last-in-first-out' principle. New predictions are always 
entered at the top of the prediction pool, and the predictions 
are nested starting at the top of the pool and proceeding 
downward. The topmost prediction in a pool need not necessarily 
be the next prediction to be fulfilled."³ It can be seen that 
at an intermediate point in the analysis of a sentence, the 
pool contains a set of predictions which are generated by the 
processing of the preceding words and which are to be fulfilled 
by the remaining words. 
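
The behavior of such a prediction pool can be sketched in a few lines of Python. This is only an illustration of the quoted description, not the Harvard implementation: the class name, the method names, and the representation of a prediction as a plain string are all assumptions made here.

```python
class PredictionPool:
    """Pushdown-like store whose topmost prediction need not be fulfilled next."""

    def __init__(self):
        self._items = []                  # the end of the list is the top of the pool

    def push(self, *predictions):
        # New predictions always enter at the top; the first argument ends up topmost.
        for p in reversed(predictions):
            self._items.append(p)

    def fulfil(self, category):
        # Search from the top downward for the first prediction the word satisfies.
        for i in range(len(self._items) - 1, -1, -1):
            if self._items[i] == category:
                return self._items.pop(i)
        return None                       # no prediction accepts this word

    def pending(self):
        return list(reversed(self._items))    # topmost prediction first


pool = PredictionPool()
pool.push("subject", "predicate")     # hypothetical initial predictions
found = pool.fulfil("predicate")      # a verb arrives first; "subject" stays on top
pool.push("object")                   # the verb's own prediction is entered on top
```

After these calls `found` is `"predicate"` and `pool.pending()` is `["object", "subject"]`: the predictions still to be fulfilled by the remaining words.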

Another well known approach to syntactic analysis is employed 
by the RAND group and is based on dependency theory as elaborated 
by David G. Hays. This method, which Hays calls "sentence 
structure determination", seeks to establish dependency 
relationships between text occurrences in the sentence. The analysis 
shows the connections among words in a sentence, where certain 
words are said to have other words dependent on them. There are 
five areas of dependency: subjective, complementary, adjectival, 
modal, and modificational. Hays has stated: "Dependency theory 
is actually a characterization theory, not necessarily associable 
with any empirical method or principle. It is a theory of 
grammars, with abstract mechanisms for characterizing sets of 
utterances and for assigning to them certain structural 
descriptions, which will be called D-trees."⁴ 

One of the oldest MT research groups, the Georgetown group, 
uses an approach which is called "General Analysis Technique". 
This method, as Michael Zarechnak has written, seeks to perform 
the translation operation "... in terms of a machine-programmable 
analysis and transfer of successively included constituents in 
the sentence."⁵ Their strategy is to perform three levels of 
analysis on the sentence. On the first level (morphemic), the 
individual word is analyzed; on the second level (syntagmatic), 
blocks of adjacent words related in certain ways are constructed; 
on the third level (syntactic), the subject(s) and predicate(s) 
of the sentence are located and analyzed. The levels are not 
self-contained or independent stages; they are segments of the 
entire technique. A detailed description of the above procedure 
has been provided by R.R. Macdonald.⁶ 

The IBM group presently utilizes a sentence-structure-determination 
routine which "... attempts to parse source-language 
sentences: to recognize their various constituents and assign 
them their position within the tree-like structure of the 
sentence." It is this routine which epitomizes machine 
translation research, and is the only linguistic area of MT where 
there is accommodation to hardware. With respect to their 
multipass translation system, it has been stated that the 
"... search routine with its pass structure attempts to make 
provision for the recognition of, on the one hand, the 
constituent structure of sentences, and, on the other hand, the points 
where sentences are embedded within others...."⁷ 



Paul L. Garvin has summarized the "fulcrum" approach of 
the Bunker-Ramo group, saying that the method "... starts out 
with the minimum unit - the morpheme (minimum unit of 
grammatical form) in straight linguistic analysis, the typographical 
word in language data processing - and considers its gradual 
fusion into units of increasingly higher orders of complexity, 
called fused units. A sentence is thus visualized, not as a 
simple succession of linear components, but as a compound chain 
of fused units of different orders of complexity variously 
encapsulated into each other. Syntactic analysis, including the 
automatic analysis which a machine translation syntax routine 
must perform, then has as its objective the identification of 
this encapsulation of fused units by ascertaining their 
boundaries and functions."⁸ 

The approach of the Wayne State University group is 
identical in theory to that of Bunker-Ramo, although the method 
of computer implementation used by the Wayne group has developed 
along distinct lines. Certain major aspects of the Wayne 
method are elaborated in the paper which follows. 






The following bibliography comprises the documentation 
for the aforementioned summaries of research carried out by 
the indicated machine translation groups. 



1. Symposium on the Current Status of Research (Austin, Texas: 
      Linguistic Research Center, University of Texas, 
      October, 1963). 

2. Lamb, Sydney M. "On the Mechanization of Syntactic Analysis," 
      Readings in Automatic Language Processing, ed. by 
      David G. Hays (New York: American Elsevier Publishing 
      Co., 1966), pp. 149-158. 

3. Sherry, Murray E. "Comprehensive Report on Predictive 
      Syntactic Analysis," NSF-7, Section I (1961). 

   For additional background, see also: 

      Oettinger, Anthony G. Automatic Language Translation 
         (Cambridge, Massachusetts: Harvard University 
         Press, 1960). 

4. Hays, David G. Dependency Theory: A Formalism and Some 
      Observations (Santa Monica, California: RAND Corp., 
      RM-4087-PR, July, 1964). 

   For additional information, see also: 

      Hays, David G. and Ziehe, T. W. Russian Sentence- 
         Structure Determination (Santa Monica, 
         California: RAND Corporation, RM-2538, April, 
         1960). 

      Hays, David G. Grouping and Dependency Theories 
         (Santa Monica, California: RAND Corporation, 
         RM-2646, September, 1960). 

      ________. On the Value of Dependency Connection 
         (Santa Monica, California: RAND Corporation, 
         RM-2712-AFOSR, January, 1961). 

5. Zarechnak, Michael. "Three Levels of Linguistic Analysis in 
      Machine Translation," Journal of the Association for 
      Computing Machinery, Volume 6, No. 1, January, 1959. 

6. Macdonald, R.R. (ed.) General Report, 1952-1963 (Washington, 
      D.C.: Georgetown University Machine Translation 
      Research Project, June, 1963). 

7. Final Report on Computer Set AN/GSQ-16 (XW-2), Volume II, The 
      Linguistic Approach (New York: IBM Corporation, 
      Thomas J. Watson Research Center, September, 1963). 

8. Garvin, Paul L. An Informal Survey of Modern Linguistics 
      (American Documentation, Volume 16, No. 4, October, 
      1965). 

   See also: 

      Garvin, Paul L. "Syntactic Retrieval," Proceedings 
         of the National Symposium on Machine 
         Translation, edited by H.P. Edmundson (Englewood 
         Cliffs, New Jersey: Prentice-Hall, Inc., 
         1961). 

      ________. "Syntax in Machine Translation," Natural 
         Language and the Computer, edited by 
         P.L. Garvin (New York: McGraw-Hill Book Co., 
         1963). 

For a general survey of activities carried out by various MT 
research groups in the area of syntax, see Summary of the 
Proceedings of the Conference of Federally Sponsored Machine 
Translation Groups on MT-Oriented Syntactic Analysis, Machine 
Translation Research Group, Wayne State University, 1962. 



INTRODUCTORY SUMMARY OF 
RESEARCH IN MACHINE TRANSLATION CONDUCTED AT 
WAYNE STATE UNIVERSITY 

This is a summary of the research which has been carried 
out at Wayne State University in developing computer programs 
to perform syntactic analysis on Russian sentences. This 
research is an integral part of the effort to develop a 
computer-aided procedure wherein high quality translation from Russian 
to English is accomplished through the interaction of man and 
machine. 

A corpus of 15 Russian mathematical articles was selected 
to provide the raw data for experimentation. Each word in the 
corpus was entered into a dictionary, along with certain of its 
grammatical properties in coded form, and at least some of its 
English translations. Each word was put into at least one of 
a possible nine syntactically based word classes: nominal, 
predicative, modifier, infinitive, gerund, adverb, preposition, 
conjunction, declined relative; homographs* were put into two, 
three, and even four word classes and coded for their properties 
in each class. The properties for which each word was coded are 
a function of the word class; the first five mentioned are more 
densely coded than the last four. The classes were taken from 
the Ramo-Wooldridge classification scheme.** The class of 
nominals includes nouns, proper names, and personal pronouns. 
The class of predicatives comprises ordinary verbs, short form 
adjectives and participles, and modals. The class of modifiers 
is made up of adjectives, participles, numerals, and 
demonstrative pronouns. The class of adverbs contains particles as well 
as ordinary adverbs. The class of declined relatives is made 
up of those pronouns which can be used to introduce a relative 
clause, e.g.: КОТОРЫЙ, ЧЕЙ, КАКОЙ. 
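
A dictionary entry of the kind described, one form carrying a separate coded reading for each word class it can belong to, might be modelled as below. This is a minimal sketch: the field names and the `governs` property are illustrative assumptions, and the project's actual grammar code format is not reproduced here.

```python
from dataclasses import dataclass, field

WORD_CLASSES = ("nominal", "predicative", "modifier", "infinitive", "gerund",
                "adverb", "preposition", "conjunction", "declined relative")


@dataclass
class DictionaryEntry:
    form: str
    translations: list
    # One property record per word class the form can belong to.
    readings: dict = field(default_factory=dict)

    def add_reading(self, word_class, **properties):
        if word_class not in WORD_CLASSES:
            raise ValueError(f"unknown word class: {word_class}")
        self.readings[word_class] = properties

    @property
    def is_homograph(self):
        # A homograph is a form assigned to more than one word class.
        return len(self.readings) > 1


nado = DictionaryEntry("НАДО", ["necessary", "over"])
nado.add_reading("predicative", governs={"infinitive"})     # illustrative coding
nado.add_reading("preposition", governs={"instrumental"})   # illustrative coding
```

Here `nado.is_homograph` is true, since the form is coded both as a predicative and as a preposition.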

* A homograph is a word which can be assigned to more than 
one word class, e.g. НАДО - 'necessary' - (predicative) and 
- 'over' - (preposition). 

** Grammar Code Format and Syntax Flow Charts, an informal 
collection of material which appeared around 1959. 



The corpus and the dictionary were punched onto cards, 
and were later put on tapes. Programs were written to update 
the tapes, as well as to select portions of the corpus, look 
them up in the dictionary, and format the looked-up tapes so 
that they could serve as input to the automatic syntactic 
analysis programs which were run on each sentence of the tape. 

The syntactic routines used in automatic sentence analysis 
are of three types. The first type comprises the blocking 
routines (nominal, prepositional, governing modifier, predicative, 
and gerund) which group immediate constituents of a sentence 
into phrases consisting of a fulcrum word and its dependents. 
The second type comprises the profiling routine which arranges 
the sentence constituents into columns according to their 
expected syntactic function(s) in the sentence. The third type 
(comprising PARSE and HYPERPARSE), using the sentence 
predicative as fulcrum, determines the actual syntactic roles of 
many of the sentence constituents (all unnested nominal blocks 
and certain unnested prepositional phrases) on the basis of the 
predicative's complementation patterns which are stored in an 
auxiliary dictionary. 

The syntactic routines are continually being revised to 
include improvements brought to light by observing the output 
of various runs. There will be a saturation point when 
improvements in some areas cause greater difficulties in others. 
HYPERPARSE will have to be extended to include a greater variety 
of sentence types in its domain of operation, and also to 
identify the roles of more of the sentence components. Problems 
of lexical choice for Russian words which have more than one 
English equivalent will have to be handled, and this will 
necessitate semantic studies. 

The writing of the syntactic routines has been greatly 
facilitated by a system, now known as GAPS, which enables the 
language analyst to write in an interpretive language rather 
than in machine language. 


The arrival of the IBM System/360 will necessitate 
reprogramming of the entire Wayne State University machine 
translation system, but it is anticipated that the new system 
will operate much more efficiently both because of 
technological improvements and because of the incorporation of 
certain valuable hindsights. 









PARSING BY MATRIX* 



* A presentation of this research was made at the Fourth 
Annual Meeting of the Association for Machine Translation and 
Computational Linguistics, held at the University of California, 
Los Angeles, California, August 26-27, 1966. 






The principal problem treated by the Wayne State University 
machine translation project since its inception has been Russian 
syntax. Syntactic resolution of a sentence is an integral part 
of the process of translating that sentence from the language in 
which it is given to another language. The purpose of performing 
syntactic analysis is to discover the structure of any given 
sentence, where a sentence is defined as a meaningful sequence of 
words (or idioms), formulas, and punctuation marks, containing 
at least one verb or verb substitute. 

The analysis performed here entails defining various 
relationships among words and word classes. Routines to seek 
pertinent items are then programmed, so that instances of these 
relationships can be recognized in given sentences. In large 
part, this is accomplished by utilizing the wealth of 
morphological information about case, number, and gender inherent in 
Russian forms and displayed in the grammar code of the forms. 

The initial relationships are implicitly defined in the 
various blocking routines. Each blocking routine is brought 
into operation when an item in a certain word class is discovered. 
This item is the fulcrum of the block, and the dependents of this 
fulcrum are recognized and included in the block when they are 
adjacent to the fulcrum or separated from it by certain 
permissible items. 

The broader relationships on the sentence level are 
recognized and marked by either of the two parsing routines, 
both of which utilize the fulcrum approach with the predicative 
block serving as fulcrum. In these routines, the proximity of 
the subsidiary sentence items to the fulcrum is not of importance. 
All of the candidates for each role which can complement the 
fulcrum are lined up in parallel, reduced logically, and selected 
in series. 






PART I 

THE BLOCKING ROUTINES* 



* These blocking routines grew out of routines developed 
on the basis of the fulcrum approach by Paul Garvin of the 
Bunker-Ramo Corporation, Canoga Park, California, in Grammar 
Code Format and Syntax Flow Charts, an informal collection 
of material which appeared around 1959. 






Certain elementary relationships, which are difficult to 
express explicitly, are implicitly defined by the syntactic 
blocking routines designed in the project. There are five 
such routines, executed in the following order: 

NBR (nominal blocking routine) 

PBR (prepositional blocking routine) 

GMBR (governing modifier blocking routine) 

VBR (predicative blocking routine) 

GBR (gerund blocking routine) 

Essentially, each routine first seeks the fulcrum element 
(always an item in the word class or subclass for which the 
routine is named), and, having found it, attempts to include 
adjacent items which depend on the fulcrum or on the dependents 
of the fulcrum, as well as items connecting the dependents. 

Prepositional blocks may contain nested nominal blocks as 
well as nested prepositional blocks. Governing modifier blocks 
and gerund blocks may contain nested nominal blocks and/or 
nested prepositional blocks. Nominal blocks and predicative 
blocks have no nested blocks at present. The concept of 
nesting may be illustrated in the following example: 



In the expression АНАЛОГИЧНЫЙ В НАШЕМ СМЫСЛЕ РАВЕНСТВУ - 
'analogous in our sense to the equality' -, the nominal block (NB) 
НАШЕМ СМЫСЛЕ - 'our sense' - is nested in the prepositional 
block (PB) В НАШЕМ СМЫСЛЕ - 'in our sense' - which in turn is 
nested in the governing modifier block (GMB) АНАЛОГИЧНЫЙ В НАШЕМ 
СМЫСЛЕ РАВЕНСТВУ. 

The structure of this governing modifier block may be 
illustrated as follows: 

              |--- NB ---| |- NB --|
АНАЛОГИЧНЫЙ В НАШЕМ СМЫСЛЕ РАВЕНСТВУ
            |---- PB ----|
|--------------- GMB --------------|
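
The nesting shown above can also be expressed as a small recursive data structure. The sketch below is an illustration only: the `Block` class is hypothetical, and treating РАВЕНСТВУ as a one-word nominal block is a reading of the diagram, not something the text states outright.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str                                   # "NB", "PB", "GMB", ...
    parts: list = field(default_factory=list)   # words (str) or nested Block objects

    def words(self):
        # Flatten the block back into its continuous segment of text.
        out = []
        for p in self.parts:
            out.extend(p.words() if isinstance(p, Block) else [p])
        return out

    def bracketed(self):
        inner = " ".join(p.bracketed() if isinstance(p, Block) else p
                         for p in self.parts)
        return f"[{self.kind} {inner}]"


nb = Block("NB", ["НАШЕМ", "СМЫСЛЕ"])
pb = Block("PB", ["В", nb])
gmb = Block("GMB", ["АНАЛОГИЧНЫЙ", pb, Block("NB", ["РАВЕНСТВУ"])])
```

Calling `gmb.bracketed()` gives `[GMB АНАЛОГИЧНЫЙ [PB В [NB НАШЕМ СМЫСЛЕ]] [NB РАВЕНСТВУ]]`, and `gmb.words()` recovers the original word sequence.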












The blocking routines, as they are presently formulated, 
produce blocks composed only of continuous segments of text. 
The system will eventually have to be expanded, so that phrases 
which are discontinuous (for example, ПУСТЬ X ВЫПОЛНЯЕТ - 'let 
X fulfill' - and ПОЗВОЛЯЮТ, В ЧАСТНОСТИ, С ОБЩЕЙ ТОЧКИ ЗРЕНИЯ 
ОСМЫСЛИТЬ - 'permit, in particular, from a general point of 
view to interpret' -) can be properly identified. 

A description of each of the types of block appears on 
the following pages. 






1) NOMINAL BLOCKING ROUTINE 

The nominal blocking routine scans the sentence from 
left to right until a nominal (noun) is found. The routine 
then groups the nominal with all of its preceding modifiers 
(i.e., adjectives, participles, numerals, and certain pronouns), 
including adverbs modifying the modifiers. The modifiers may 
be in simple agreement with the nominal; they may be in abnormal 
agreement, as in the case of numerals greater than one and words 
like МНОГО - 'much, many' -, НЕСКОЛЬКО - 'several' -; they may be 
in extended agreement where two or more singular modifiers 
modify a plural nominal: ПЕРВАЯ И ВТОРАЯ КНИГИ - 'first and 
second books' -, or one or more plural modifiers modify two or 
more nominals, the first of which is singular: ХОРОШИЕ КНИГА 
И КАРАНДАШ - 'good book and pencil' -. In the last case, the 
second nominal is included in the block. 

Those adverbs which are interspersed among a series of 
modifiers which belong to a nominal are construed to belong 
to the modifiers and hence to the block. Under certain 
conditions, adverbs to the left of the leftmost member of such 
a series of modifiers are included in the block. 

Thus, a nominal block is created whose agreement code is 
that of the nominal, with possible reduction of ambiguity on 
the basis of the modifiers, or with nominative and/or accusative 
agreement bits if there is at least one modifier requiring 
abnormal agreement. The government code of the block is that 
of the last nominal. 
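
The reduction of ambiguity by the modifiers amounts to intersecting agreement codes. The following sketch handles simple agreement only; abnormal and extended agreement, and the conditions for adverbs left of the leftmost modifier, are deliberately left out. The function name, the dictionary keys, and the representation of an agreement code as a set of (case, number) pairs are assumptions, and the example phrase ХОРОШИЕ КНИГИ - 'good books' - is supplied here for illustration.

```python
def nominal_blocks(items):
    """Group each nominal with its agreeing preceding modifiers.

    items: list of dicts with 'word', 'cls' and, for declined words, 'agr',
    a set of (case, number) pairs standing in for the agreement code.
    """
    blocks = []
    for i, item in enumerate(items):
        if item["cls"] != "nominal":
            continue
        start, agr = i, set(item["agr"])
        j = i - 1
        while j >= 0 and items[j]["cls"] in ("modifier", "adverb"):
            if items[j]["cls"] == "modifier":
                common = agr & items[j]["agr"]
                if not common:
                    break                 # no agreement: the block ends here
                agr, start = common, j    # ambiguity reduced by the modifier
            j -= 1                        # adverbs between modifiers are absorbed
        blocks.append({"words": [it["word"] for it in items[start:i + 1]],
                       "agr": agr})
    return blocks


sentence = [
    {"word": "ХОРОШИЕ", "cls": "modifier",
     "agr": {("nom", "pl"), ("acc", "pl")}},
    {"word": "КНИГИ", "cls": "nominal",
     "agr": {("gen", "sg"), ("nom", "pl"), ("acc", "pl")}},
]
blocks = nominal_blocks(sentence)
```

The noun alone is three ways ambiguous; the resulting block keeps only the nominative plural and accusative plural readings, since the genitive singular is excluded by the modifier.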

This routine can make the error of linking a governing 
modifier, which is in the case that it governs, with the following 
(governed) nominal, to produce a nominal block instead of the 
correct governing modifier block: ИМЕЮЩИЕ НЕПРЕРЫВНЫЕ ПРОИЗВОДНЫЕ 
- 'having continuous derivatives' -. When a list of modifiers 
which must be complemented (e.g. ИМЕЮЩИЙ - 'having' -) is compiled 
and the dictionary entries are coded, then the nominal blocking 
routine could test for this property, and not combine such 
a modifier with a nominal into a nominal block. This would 
reduce the frequency of such error. 

After creating the nominal blocks, the routine seeks two 
nominal blocks, having identical agreement codes, which are 
on opposite sides of a coordinating conjunction. If such a pair is 
found, the two are combined into one nominal block whose 
agreement code has the same case(s) as the original ones, but in the 
plural. For example, ТЕОРЕМА И ФУНКЦИЯ - 'theorem and function' -, 
where each noun is feminine nominative singular, becomes a block 
in nominative plural. (In the example ТАК СКАЗАЛ ГЕЛЬФАНД И ШИЛОВ 
СОГЛАСИЛСЯ - 'So said Gel'fand and Shilov agreed' -, the combining 
of ГЕЛЬФАНД И ШИЛОВ - 'Gel'fand and Shilov' -, where both names 
have identical agreement codes, is incorrect, but it is presumed 
that such cases are rare.) 

There is a question of whether to reduce the requirement 
that the agreement codes be identical to the requirement that 
they have non-zero intersection, but this question has not yet 
been resolved. There is an instance in the corpus (Article V, 
p. 1, s. 18) where К.И. БАБЕНКО И Г.Е. ШИЛОВЫМ - 'K.I. Babenko 
and G.E. Shilov' - is not combined, because the agreement code 
of К.И. БАБЕНКО - 'K.I. Babenko' - includes more cases (since 
БАБЕНКО - 'Babenko' - is undeclined) than does the agreement 
code of Г.Е. ШИЛОВЫМ - 'G.E. Shilov' - which is instrumental only. 
However, if the criterion of combining is reduced to "non-zero 
intersection of agreement codes", there is the danger that 
ТЕОРИИ И СИСТЕМЫ - 'theory/theories' and 'system/systems' - 
would be blocked in the following context: ЭТО УКАЗАНО В ТЕОРИИ 
(ГЕЛЬФАНДА) И СИСТЕМЫ, ТАКИМ ОБРАЗОМ, ИМЕЮТ СИЛУ - 'This is 
proved in the theory (of Gel'fand) and the systems thus hold.' 
Here, ТЕОРИИ - 'theory' - functions as locative singular, while 
СИСТЕМЫ - 'systems' - functions as nominative plural; the 
intersection of agreement codes, however, is genitive singular - 
nominative plural - accusative plural. In any event, combining 
the blocks prevents the separate roles of the two blocks from 
being distinguished. 
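
The two candidate criteria for combining blocks across a coordinating conjunction can be compared directly. In this sketch an agreement code is again assumed to be a set of (case, number) pairs; the function name and the rule used to form the combined code under the relaxed criterion (plural of the shared cases) are assumptions, since the text leaves that point open.

```python
def combine(agr_left, agr_right, require_identical=True):
    """Return the agreement code of the combined block, or None if not combined."""
    if require_identical:
        ok = agr_left == agr_right
    else:
        ok = bool(agr_left & agr_right)       # relaxed: non-zero intersection
    if not ok:
        return None
    shared_cases = {case for case, _ in agr_left & agr_right}
    return {(case, "pl") for case in shared_cases}   # same case(s), but plural


teorema = {("nom", "sg")}
funkciya = {("nom", "sg")}
teorii = {("gen", "sg"), ("dat", "sg"), ("loc", "sg"), ("nom", "pl"), ("acc", "pl")}
sistemy = {("gen", "sg"), ("nom", "pl"), ("acc", "pl")}

strict_ok = combine(teorema, funkciya)                         # combined
strict_no = combine(teorii, sistemy)                           # not combined
relaxed = combine(teorii, sistemy, require_identical=False)    # wrongly combined
```

Under the identity requirement ТЕОРИИ and СИСТЕМЫ stay apart; under the relaxed criterion they are merged into one block with genitive, nominative, and accusative plural readings, which is the danger described above.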












2) PREPOSITIONAL BLOCKING ROUTINE 

The prepositional blocking routine scans the sentence from 
right to left until a preposition is found. Then, skipping 
only adverbs and/or nested prepositional blocks, it blocks the 
preposition with the following a) nominal block, b) (unblocked) 
modifier, or c) declined relative, provided that the government 
code of the preposition has positive intersection with the 
agreement code of a), b), or c). If a declined relative is the 
object of the preposition, the block is specially marked. 
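
A minimal sketch of this routine follows. It assumes, for illustration only, that each item is a dictionary whose 'gov' (for prepositions) and 'agr' (for objects) are sets of case names, and that blocks already formed carry the class labels "NB" and "PB"; none of these names come from the project's own coding.

```python
def prepositional_blocks(items):
    """Scan right to left, blocking each preposition with a compatible object."""
    items = list(items)
    i = len(items) - 1
    while i >= 0:
        if items[i]["cls"] == "preposition":
            j = i + 1
            # Skip only adverbs and prepositional blocks formed further right.
            while j < len(items) and items[j]["cls"] in ("adverb", "PB"):
                j += 1
            if j < len(items) and items[j]["cls"] in ("NB", "modifier",
                                                      "declined relative"):
                common = items[i]["gov"] & items[j]["agr"]
                if common:                # positive intersection of the codes
                    block = {"cls": "PB", "parts": items[i:j + 1],
                             "cases": common,
                             "relative": items[j]["cls"] == "declined relative"}
                    items[i:j + 1] = [block]
        i -= 1
    return items


phrase = [
    {"word": "В", "cls": "preposition", "gov": {"acc", "loc"}},
    {"word": "НАШЕМ СМЫСЛЕ", "cls": "NB", "agr": {"loc"}},
]
result = prepositional_blocks(phrase)
```

One consequence of the right-to-left scan, as sketched here, is that a preposition further to the right is blocked first, so its block already exists when a preposition to its left is reached.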







3) GOVERNING MODIFIER BLOCKING ROUTINE 



The governing modifier blocking routine scans the sentence 
from left to right until an unblocked governing modifier is 
found. The routine then blocks the governing modifier with: 

1) governed prepositional blocks 

2) governed nominal blocks 

3) nominal blocks which agree with the governing modifier 

The following structures are allowed to intervene between 
the abovementioned three: 



a) adverbs 

b) any nominal block which can be construed as an 
adjunct to either the last nominal in the preceding 
prepositional block or the last nominal in the 
preceding nominal block, and (potentially) 
instrumental blocks 

c) ungoverned prepositional blocks. 



The routine marks nested nominal blocks as to whether 
they are governed by the governing modifier and/or agree with 
the governing modifier and/or are adjuncts to a preceding 
nominal block. It also marks prepositional phrases which are 
governed. 

In the example НЕ УДОВЛЕТВОРЯЕТ ТРЕБОВАНИЯМ, ОБЕСПЕЧИВАЮЩИМ 
РАЗРЕШИМОСТЬ ИНТЕРПОЛЯЦИОННОЙ ЗАДАЧИ - 'does not satisfy the 
requirements, ensuring solvability of the interpolation problem' -, 
since the governing modifier governs the accusative and the dative 
cases, РАЗРЕШИМОСТЬ - 'solvability' - is marked as governed 
(since it is nom/acc) and ИНТЕРПОЛЯЦИОННОЙ ЗАДАЧИ - 'interpolation 
problem' - is marked as an adjunct (since it is gen). In the 
governing modifier block section of the phrase ПРИ КАКИХ 
ДОПОЛНИТЕЛЬНЫХ УСЛОВИЯХ, НАЛАГАЕМЫХ НА {λ_n} - 'under what 
supplementary conditions, imposed on {λ_n}' - the prepositional 
phrase НА {λ_n} - 'on {λ_n}' - is marked as a governed 
prepositional block. 






The markings on the nominal blocks constitute a matrix 
from which it may sometimes be possible to determine whether 
the governing modifier block is functioning as a nominal block 
or as a phrase modifying a nominal block which is outside of 
its boundaries. If a block is marked in position G when it is 
governed, in position F when it agrees with the governing 
modifier, and in position A when it is an adjunct, then a nominal 
block in a governing modifier block may have one of the following 
vectors: 

VECTOR   G   F   A   MEANING 

a)       0   0   0   instrumental (where instrumental 
                     case is not governed, and governing 
                     modifier is not in instrumental case) 
b)       0   0   1   adjunct block 
c)       0   1   0   agreeing (fulcrum) block 
d)       1   0   0   governed block 
e)       0   1   1   adjunct v agreeing block 
f)       1   0   1   adjunct v governed block 
g)       1   1   0   agreeing v governed block 
h)       1   1   1   adjunct v agreeing v governed block 

The vectors associated with all of the nominal blocks in 
a governing modifier block (except those which are nested in 
prepositional phrases) form an n x 3 matrix, where n is the 
number of nominal blocks in the governing modifier block. If 
there is exactly one row of type c), the governing modifier 
block must be made a nominal block, since it contains the 
agreeing nominal block. The "F" column may then be zeroed out 
in the rest of the matrix, and the role ambiguity of the 
remaining nominal blocks may be reduced. If there is exactly 
one row of type d), the "G" column may then be zeroed out in 
the rest of the matrix, and a vector of type c) may be sought. 
(This, of course, holds only when exactly one case is governed.) 
It will be necessary to investigate additional reduction schemes, 
so that the governing modifier blocks which are really nominal 
blocks, syntactically speaking, may be identified and properly 
processed. 
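
The two reduction rules stated for this matrix can be sketched as follows. Each row is a (G, F, A) triple; the function name, the flag `one_case_governed`, and the choice to apply the type d) rule before looking for a type c) row are assumptions made for the illustration.

```python
GOVERNED, AGREEING = (1, 0, 0), (0, 1, 0)     # rows of type d) and type c)


def reduce_matrix(rows, one_case_governed=True):
    """Apply the two reduction rules to an n x 3 matrix of (G, F, A) rows.

    Returns (is_nominal, reduced_rows); is_nominal is True when exactly one
    agreeing row remains, so the block is to be made a nominal block.
    """
    rows = [list(r) for r in rows]
    if one_case_governed:                     # the d) rule holds only in this case
        d_rows = [i for i, r in enumerate(rows) if tuple(r) == GOVERNED]
        if len(d_rows) == 1:
            for i, r in enumerate(rows):
                if i != d_rows[0]:
                    r[0] = 0                  # zero out the G column elsewhere
    c_rows = [i for i, r in enumerate(rows) if tuple(r) == AGREEING]
    is_nominal = len(c_rows) == 1
    if is_nominal:
        for i, r in enumerate(rows):
            if i != c_rows[0]:
                r[1] = 0                      # zero out the F column elsewhere
    return is_nominal, [tuple(r) for r in rows]


is_nominal, reduced = reduce_matrix([(1, 0, 0), (1, 1, 0), (0, 0, 1)])
```

In this example the single governed row lets the second row lose its G marking, which turns it into the one agreeing row, so the governing modifier block is made a nominal block.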






Modifiers which are marked with the governing bit, but 
which have no marking for specific governed structures, are 
allowed. This provision was made because of several examples 
in text where an ordinary modifier was followed by qualifying 
phrases, e.g. ПАРАБОЛИЧЕСКИХ В СМЫСЛЕ ПЕТРОВСКОГО СИСТЕМ - 
'parabolic in the sense of Petrovsky systems' -. In order to 
connect ПАРАБОЛИЧЕСКИХ - 'parabolic' - with СИСТЕМ - 'systems' - 
it is necessary to combine the prepositional phrase В СМЫСЛЕ 
ПЕТРОВСКОГО - 'in the sense of Petrovsky' -, with the modifier 
ПАРАБОЛИЧЕСКИХ. 

The phrase НЕОТРИЦАТЕЛЬНАЯ НЕПРЕРЫВНАЯ НА СЕГМЕНТЕ [formula] 
ФУНКЦИЯ - 'non-negative continuous on the segment [formula] function' -, 
presents an additional problem, namely that of picking up a 
sequence of modifiers in a supplementary nominal blocking routine 
to be executed after governing modifier blocking. 

One may speculate about the wisdom of subjecting all unblocked 
modifiers to the governing modifier routine. This, of course, 
would lead to some incorrect results, for example, in the 
sentence ПРЕДСТАВЛЯЕТСЯ ОЧЕНЬ ЦЕННЫМ В ИССЛЕДОВАНИЯХ ГЕЛЬФАНДА 
И ШИЛОВА НЕ ТОЛЬКО ВВЕДЕНИЕ ..., - 'Not only the introduction ... 
is very valuable in the investigations of Gel'fand and Shilov' -. 
Here, the instrumental modifier serves as the complement of the 
verb rather than a governing modifier, so that if it were 
classified as a governing modifier, a search for the agreeing 
nominal would lead to error, since such a nominal does not exist. 






4) PREDICATIVE BLOCKING ROUTINE 

The predicative blocking routine scans the sentence from
left to right until a predicative (finite verb, short form
modifier, modal, or special verb form) is found. Then, after
searching to the left of the predicative for the negative
particle "НЕ" (skipping adverbs) and including "НЕ" as the left
boundary of the block (if "НЕ" is found), the routine proceeds
to the right of the predicative, identifying and including any
temporal auxiliaries and/or infinitive complements found. The
agreement code of the block is usually the agreement code of
the first predicative, and the government code is that of the
last item of the block.
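
The boundary search just described can be sketched as follows. This is only an illustration under stated assumptions: the `Word` record and the tag names ("finite_verb", "negative_particle", and so on) are hypothetical and do not reproduce the project's dictionary coding, and intervening adverbs to the right of the predicative are not handled.

```python
from dataclasses import dataclass

# Hypothetical word-class tags; the report's real coding is not shown here.
PREDICATIVE = {"finite_verb", "short_form_modifier", "modal", "special_verb_form"}
RIGHT_EXTENSION = {"temporal_auxiliary", "infinitive"}


@dataclass
class Word:
    text: str
    kind: str


def block_predicative(words):
    """Return (left, right) word indices of the first predicative block, or None."""
    # Scan left to right for the first predicative.
    for i, w in enumerate(words):
        if w.kind in PREDICATIVE:
            break
    else:
        return None

    # Look left for the negative particle, skipping adverbs.
    left = i
    j = i - 1
    while j >= 0 and words[j].kind == "adverb":
        j -= 1
    if j >= 0 and words[j].kind == "negative_particle":
        left = j

    # Extend right over temporal auxiliaries and infinitive complements.
    right = i
    k = i + 1
    while k < len(words) and words[k].kind in RIGHT_EXTENSION:
        right = k
        k += 1
    return left, right
```

For a tagged sentence such as "ЭТА ТЕОРЕМА НЕ ВСЕГДА ПОЗВОЛЯЕТ ДОКАЗАТЬ ЛЕММЫ" the sketch would return the span running from НЕ through ДОКАЗАТЬ.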












5) GERUND BLOCKING ROUTINE 

The gerund blocking routine scans the sentence from right
to left until a gerund is found. The routine then blocks the
gerund with:

1) governed prepositional blocks

2) governed nominal blocks

3) nominal blocks which agree with the governing
modifier.

The following structures are allowed to intervene between
the abovementioned three:

a) adverbs

b) any nominal block which can be construed as an
adjunct to either the last nominal in the preceding
prepositional block or the last nominal in the
preceding nominal block, and (potentially)
instrumental blocks

c) ungoverned prepositional blocks.










Once a sentence is blocked, the next step in discovering
the structure of the sentence is to analyze it from the
standpoint of its fulcrum, the predicative block. In order to do
this, it is necessary to put each sentence component, i.e.,
each block or individual unblocked item, into a list or matrix,
according to the potential role(s) of that component in the
sentence. This procedure is accomplished by the syntactic
routine, PROFILE.

Initially, PROFILE makes the following assignments: 



SENTENCE COMPONENT                                   COLUMN

a) unnested nominal blocks in the *
     i) nominative case                              COL I
    ii) governed case (under some conditions,
        see below)                                   COL II
   iii) nominative, genitive, dative,
        accusative, instrumental cases               COL III
b) predicative blocks                                PROFILE
c) governing modifier blocks                         DUMP
d) unblocked infinitives                             PROFILE
e) gerund blocks                                     DUMP
f) unblocked adverbs                                 DUMP
g) unnested prepositional blocks
     i) with declined relative object                PROFILE
    ii) other                                        PREP
h) unblocked conjunctions (including most
   punctuation)                                      PROFILE
i) expressions in parentheses                        DUMP
j) declined relatives                                PROFILE



NOTE: An unblocked modifier presently leads to an error condition.

*Since a nominal block may be in more than one case, it
may be entered in more than one column. For example, the noun
НОЧИ - 'night/nights' - may be genitive, dative, or locative
singular, or nominative or accusative plural, and may
therefore be entered both in COL I and COL III.
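
The initial assignments can be pictured as a simple lookup. The sketch below is hypothetical: the component names, the one-letter case codes, and the function `profile_columns` are illustrative assumptions, not the project's actual data layout.

```python
# Columns for components whose destination does not depend on case.
# Component names are hypothetical labels for rows b) through j) of the table.
FIXED = {
    "predicative_block": "PROFILE",
    "governing_modifier_block": "DUMP",
    "unblocked_infinitive": "PROFILE",
    "gerund_block": "DUMP",
    "unblocked_adverb": "DUMP",
    "unblocked_conjunction": "PROFILE",
    "parenthetical": "DUMP",
    "declined_relative": "PROFILE",
}
ADJUNCT_CASES = frozenset("NGDAI")  # every case except locative (L)


def profile_columns(kind, cases=frozenset(), governed_cases=frozenset(),
                    relative_object=False):
    """Return the set of columns into which a sentence component is entered."""
    if kind == "nominal_block":
        cols = set()
        if "N" in cases:
            cols.add("COL I")
        if cases & governed_cases:
            cols.add("COL II")
        if cases & ADJUNCT_CASES:
            cols.add("COL III")
        return cols
    if kind == "prepositional_block":
        return {"PROFILE"} if relative_object else {"PREP"}
    if kind not in FIXED:
        # e.g. an unblocked modifier, which the report treats as an error
        raise ValueError(f"no column for component kind {kind!r}")
    return {FIXED[kind]}
```

With the cases of НОЧИ (G, D, L singular; N, A plural) and no governed case, the sketch enters the block in COL I and COL III, as in the footnote.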












It can be seen that the components which are entered into
PROFILE are, for the most part, clause determiners. Those which
are entered into DUMP are not part of the essential structure of
the sentence. Unnested nominal blocks, which are in COL I and
COL III, can have the following roles in a sentence:

i) subject

ii) governed predicative complement

iii) adjunct, where adjunct is presently taken to mean

a) a dependent of a preceding nominal block

b) an adverbial expression in the instrumental case,
e.g. ТАКИМ ОБРАЗОМ - 'thus' -, or the accusative
(of time) case, e.g. ВСЮ НОЧЬ - 'all night' -.

iv) appositive (This role will be ignored in what follows.)







The prepositional blocks, which are in PREP, can have the
following roles in a sentence:

i) governed predicative complement

ii) adverbial expression

iii) adjunct to a preceding nominal block
(This role will be ignored in what follows.)












After creating the columns, the routine proceeds to test
the column entitled PROFILE in order to ascertain the number
of predicative blocks in the sentence. At present, further
analysis is done only on those sentences having at most one
predicative block. (Sentences having no predicative block are
treated as if they had a verb governing the nominative case
only.) If the sentence has exactly one predicative block, the
block is tested for its government properties. If the block
governs at most one case and nothing else, then the routine
creates an additional column, COL II, consisting of all unnested
nominal blocks in the governed case, and calls on the syntactic
routine PARSE for further analysis. If the block has other
government properties, then the syntactic routine HYPERPARSE
is called upon for further analysis. PARSE and HYPERPARSE are
described in the following pages.
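
The choice between the two routines amounts to a small decision procedure. A minimal sketch, assuming each predicative block is summarized by a hypothetical dictionary with a set of governed `cases` and a flag `other` for any non-case government:

```python
def choose_parser(predicative_blocks):
    """Return (routine name, cases for COL II), or None if no analysis is attempted.

    Each block is a dict with hypothetical keys:
      'cases' - set of governed case letters, e.g. {"A"}
      'other' - True if prepositions, clauses, etc. are also governed
    """
    if len(predicative_blocks) > 1:
        return None  # more than one predicative block: not analyzed at present
    if not predicative_blocks:
        # Treated as if a verb governing the nominative only were present.
        return "PARSE", {"N"}
    block = predicative_blocks[0]
    if len(block["cases"]) <= 1 and not block["other"]:
        return "PARSE", set(block["cases"])
    return "HYPERPARSE", None
```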






If we consider only simple sentences having a predicative
block which governs at most one case and no preposition (in
which case unnested prepositional blocks will be regarded as
adverbial), it is possible to determine all possible
interpretations of a sentence by taking into account all possible
roles of each unnested nominal block in that sentence. When
each nominal block has been assigned one of the three roles,
subject, governed predicative complement, or adjunct, then
the sentence is said to be 'parsed', and the assignment is
called a parsing.

The technique used on this project for finding all possible
parsings of a simple sentence having a predicative block with
restricted government is to create, modify, and analyze an
n x 3 matrix, where n is the number of unnested nominal blocks
in the sentence, and 3 is the number of roles which can presently
be assigned to a nominal block.

1) The creation of the matrix is accomplished by the
syntactic routine PROFILE. COL I contains the potential subjects,
i.e., nominal blocks in the nominative case. COL II contains
the potential governed predicative complements, i.e., nominal
blocks in the governed case. COL III contains the potential
adjuncts, i.e., nominal blocks in the cases other than locative.
A given nominal block can be entered in as many columns as its
agreement code will allow.
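
As an illustration of the creation step, the n x 3 matrix can be modelled as one triple of truth values per unnested nominal block. The representation (sets of one-letter case codes, a function named `build_matrix`) is an assumption made for this sketch only.

```python
ADJUNCT_CASES = {"N", "G", "D", "A", "I"}  # all cases other than locative


def build_matrix(block_cases, governed_case):
    """Build the n x 3 role matrix.

    block_cases   - list of sets of case letters, one set per unnested nominal block
    governed_case - the single case governed by the predicative block, or None
    Each row is (in COL I, in COL II, in COL III).
    """
    return [
        (
            "N" in cases,                 # COL I: potential subject
            governed_case in cases,       # COL II: potential governed complement
            bool(cases & ADJUNCT_CASES),  # COL III: potential adjunct
        )
        for cases in block_cases
    ]
```

For three blocks with cases {N}, {A}, and {N, A} and a predicative block governing the accusative, this yields rows entered in COL I and III, COL II and III, and all three columns respectively.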






Once the matrix is created by PROFILE, the modification
and analysis of the matrix are accomplished by the syntactic
routine PARSE.

2) The modification is carried out in two stages:

I. Certain grammatical considerations sometimes make it possible
to remove entries from COL I and COL III.

A) COL I can be reduced by removing all nominal blocks
which do not have the correct number, person, or
gender to be the subject of the predicative block of
the sentence. Although НОЧИ - 'night/nights' - has
nominative plural bits, it cannot be the subject in
a sentence where ДАЕТ - 'gives' - is the predicate,
since ДАЕТ is singular. (Possible cause of error:
compound subject, e.g. ИВАН И КАТЯ ИДУТ - 'John and
Kathy are going' -)

B) COL III can be reduced by removing every potential
adjunct which is neither in the instrumental case (or
accusative of time) nor immediately preceded by a
nominal block which governs it. (Possible cause of
error: governor of adjunct does not immediately precede
adjunct, e.g., ПРОБЛЕМА КОШИ ЕДИНСТВЕННОСТИ - 'Cauchy's
problem of uniqueness' -)

II. After COL I and COL III are reduced through grammatical
considerations, it is possible to reduce COL I and COL II
using logical considerations, provided that the following
condition is imposed: In any given parsing, at most one element
(nominal block) may be assigned the role of subject, and at
most one element may be assigned the role of governed
predicative complement. (This means that sentences having
compound subjects and/or compound governed predicative
complements, e.g., ТЕОРЕМА ГЕЛЬФАНДА И ТЕОРЕМЫ ШИЛОВА
ПРЕДСТАВЛЯЮТСЯ - 'The theorem of Gel'fand and the theorems
of Shilov are presented' -, cannot be properly parsed.
Sentences with compound subjects and/or compound predicative
complements can be parsed when the components of the
compound block are combined into one block by the NBR, which
does so under special circumstances.)






The logical reduction takes place as follows: If there is
a nominal block, N, in COL I (or COL II), such that N is not
entered in any other column, then all other entries in COL I
(or COL II) must be erased.

The reason is:

a) N must be assigned a role,

b) N can only be assigned the role associated with
COL I (or COL II),

c) the role associated with COL I (or COL II) can be
assigned to at most one nominal block;

hence N is the only nominal block which can be assigned
the role associated with COL I (or COL II), and,
in fact, N must always be assigned that role.

NOTE: If there are two such Ns in a column, an error message
is written.

In any reduction step, it is imperative to avoid obliterating
any nominal block, i.e., erasing it from some column when
it is not entered in any other column, for a parsing must assign
a role to each unnested nominal block. In PARSE, the
obliteration of a nominal block in either stage of the modification
portion of the routine causes an error message to be written,
and the parsing to be discontinued.
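
The logical reduction and the two error conditions can be sketched as below. The representation is an assumption: each row of the matrix is a set of column names drawn from "I", "II", "III", and a single pass is made over COL I and then COL II (the report does not say whether the step is repeated).

```python
class ParseError(Exception):
    """Raised where the report says an error message is written."""


def logical_reduction(rows):
    """Apply the logical reduction to a list of rows (sets of column names)."""
    rows = [set(r) for r in rows]
    for col in ("I", "II"):
        # Blocks entered in this column and in no other column.
        forced = [i for i, r in enumerate(rows) if r == {col}]
        if len(forced) > 1:
            raise ParseError(f"two blocks can only be assigned the role of COL {col}")
        if forced:
            for i, r in enumerate(rows):
                if i != forced[0]:
                    r.discard(col)  # erase all other entries in the column
    # A parsing must assign a role to every block: no row may be empty.
    if any(not r for r in rows):
        raise ParseError("a nominal block has been obliterated")
    return rows
```

For rows {I}, {II, III}, {II}, the third block can only be the governed complement, so the second block is erased from COL II and survives in COL III.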






3) The analysis portion of the PARSE routine consists of
finding all possible parsings, i.e., all possible ways of
choosing at most one nominal block from COL I and at most one
nominal block from COL II and as many nominal blocks as
necessary from COL III so that exactly one assignment is made
for each row in the matrix.



Consider the following matrix: 





                                     COL I    COL II    COL III

A) first unnested nominal block        X                   X
B) second unnested nominal block                 X         X
C) third unnested nominal block        X         X         X



The set of parsings, where each matrix element is represented 
by its row (A, B, C) and column (I, II, III), is: 



PARSING    SUBJECT    OBJECT    ADJUNCT(S)

   1       A,I        B,II      C,III
   2       A,I        C,II      B,III
   3       C,I        B,II      A,III
   4       A,I        NONE      B,III; C,III
   5       C,I        NONE      A,III; B,III
   6       NONE       B,II      A,III; C,III
   7       NONE       C,II      A,III; B,III
   8       NONE       NONE      A,III; B,III; C,III



NOTE: The above list of parsings illustrates a feature of the
PARSE routine: namely, that the absence of the subject
and/or of the governed item is allowed, as long as the
parsing utilizes all unnested nominal blocks in the
sentence.
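
The eight parsings listed above can be reproduced by brute-force enumeration. The sketch below is not the project's routine; it simply tries every assignment of one permitted column per row and keeps those with at most one subject and at most one governed complement.

```python
from itertools import product


def parsings(matrix):
    """Enumerate parsings of a role matrix.

    matrix - dict mapping a row label to the set of columns ("I", "II", "III")
             in which that nominal block is entered
    Returns a list of dicts mapping each row label to its assigned column.
    """
    rows = list(matrix)
    found = []
    for choice in product(*(sorted(matrix[r]) for r in rows)):
        # At most one subject (COL I) and one governed complement (COL II).
        if choice.count("I") <= 1 and choice.count("II") <= 1:
            found.append(dict(zip(rows, choice)))
    return found


example = {"A": {"I", "III"}, "B": {"II", "III"}, "C": {"I", "II", "III"}}
for p in parsings(example):
    print(p)
```

Of the 2 x 2 x 3 = 12 raw assignments, two place both A and C in COL I and two place both B and C in COL II, leaving the eight parsings of the table.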



The parsing scheme just described was satisfactory for
sentences containing predicatives with a limited simple type
of government. Before proceeding to the description of the
more general parsing routine, HYPERPARSE, it is valuable to
consider the more complex types of predicative government.












PART IV 

PREDICATIVE GOVERNMENT* 



*For an excellent description of complementation, see
Andrew S. Kozak, Complementation in Russian: Theory and
Application, Memorandum RM-4582-PR (The RAND Corporation,
Santa Monica, California), September, 1965.









Predicative government is a term which can roughly be
defined as the property possessed by a predicative of requiring
complementation by one or more of various structures.
Complementation is distinct from modification, and the distinction
lies in whether a structure which is a candidate for
complementation can be used (almost) universally (in which case it
is a modifying structure) or is peculiar either to the
predicative being studied or to a proper subset of all predicatives.

The types of structure which can serve as complements to 
a predicative are: 

1) nominal blocks (or modifiers) in

(N) nominative case
(G) genitive case
(D) dative case
(A) accusative case
(I) instrumental case

2) prepositional blocks

(the set of prepositions which are
actually in complementary structures
has not yet been defined)

3) ЧТО and ЧТОБЫ phrases

4) infinitives

5) КАК phrases

6) adverbs

Infinitive government is handled in the VBR when the
infinitive is contiguous to the predicative (allowing intervening
adverbs). Types 5) and 6) have not yet been studied extensively.

Let us consider only case and prepositional complementation.
ДЕЛАТЬ - 'to do' - can govern accusative (A), dative (D),
instrumental (I), and ИЗ - 'out of' - + genitive (G). This alone
does not specify which combinations of these governed items can
occur. It happens that the accusative can occur alone, with
the instrumental or with the dative, or with ИЗ + genitive.
Also, the accusative can occur with the dative and ИЗ + genitive.
This information gives five valid complementation patterns which
are represented and exemplified as follows:

*(L) locative case is not a predicative complementation case. 






1) A                      ОН ДЕЛАЕТ ГОРШОК. -
                          'He is making a pot.'

2) D and A                ОН НАМ ДЕЛАЕТ НОВОЕ ПРЕДЛОЖЕНИЕ. -
                          'He is making us a new offer.'

3) A and I                ЭТО ДЕЛАЕТ ГЛИНУ ОГНЕУПОРНОЙ. -
                          'This makes the clay fireproof.'

4) A and ИЗ + G           ОН ДЕЛАЕТ ИЗ ГЛИНЫ ГОРШОК. -
                          'He is making a pot out of clay.'

5) D and A and ИЗ + G     ОН НАМ ДЕЛАЕТ ИЗ ГЛИНЫ ГОРШОК. -
                          'He is making us a pot out of clay.'



(Note that a total of $2^4 = 16$ patterns can be obtained from
all combinations of the four governed structures;
the enumeration of the valid ones reduces this number to five.)
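
The count of 16 is simply the number of subsets of the four governed structures. A short sketch makes the arithmetic concrete; the structure labels (with "IZ+G" standing for ИЗ + genitive) are illustrative, and the five valid patterns are those listed above for ДЕЛАТЬ.

```python
from itertools import combinations

STRUCTURES = ("D", "A", "I", "IZ+G")  # "IZ+G" stands for ИЗ + genitive

# The five valid complementation patterns given in the text for ДЕЛАТЬ.
VALID = [
    {"A"},
    {"D", "A"},
    {"A", "I"},
    {"A", "IZ+G"},
    {"D", "A", "IZ+G"},
]


def all_combinations(structures):
    """Every subset of the governed structures, including the empty one."""
    return [
        set(c)
        for k in range(len(structures) + 1)
        for c in combinations(structures, k)
    ]


candidates = all_combinations(STRUCTURES)
print(len(candidates), "possible combinations,", len(VALID), "valid patterns")
```

Every valid pattern for this verb contains the accusative, which a summary-only government code cannot express.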

Predicative government in the original MT dictionary of
this project was coded in such a way that all complements for
a given predicative are indicated in summary only, without any
indication of which combinations of these governed structures
can actually occur. When considering predicative blocks with
extremely simple government codes (i.e., at most one case
governed, and no prepositions governed), the predicative government
coding of the original MT dictionary is usable. However, in
order to parse sentences having a predicative block with more
complex complementation patterns, more comprehensive input
information about these patterns is needed.

Predicative government input information now consists of
the enumeration of each combination in which the governed
structures (nominal blocks in a certain case, prepositional blocks
where the preposition and the case it governs are specified, and,
for later use, clauses introduced by ЧТО - 'that' - and ЧТОБЫ
- 'that' - and phrases introduced by КАК - 'as' -) can occur.
Each possible combination (including information about whether
a subject can occur) is called a pattern, and the totality of
patterns for a given predicative is called the pattern set for
that predicative. The pattern set for each predicative in the
corpus is stored in an auxiliary dictionary.









Following are the coding instructions used in the creation
of the auxiliary dictionary. The instructions are followed by
sample coding forms for the patterns of ДЕЛАТЬ - 'to do'.
These forms are kept in a permanent reference file for language
example documentation of each pattern.






Coding Instructions for Verbal Government Patterns

FIRST (CANONICAL FORM) CARD:

cols 1-6     ID NUMBER
             Write the four-digit identification number
             assigned to the canonical form. The fifth
             and sixth digits, entitled 'XTRA', are used
             only for inserting a form. Otherwise they
             are blank.

cols 7-12    blanks

cols 13-42   RUSSIAN CANONICAL FORM
             Write the infinitive form of the verb, or
             the neuter form of the short form modifier
             for which the patterns apply. In case a
             certain form of the set represented by the
             infinitive or the neuter short form has a
             special set of patterns, write that form
             and code its patterns separately.

cols 43-80   blanks

NEXT (PATTERN) CARDS:

cols 1-6     ID NUMBER
             Write the four-digit identification number
             assigned to the canonical form. The fifth
             and sixth digits, entitled 'XTRA', are used
             only for inserting a form. Otherwise they
             are blank.

col 7        P

cols 8-10    PATTERN #
             Write the two-digit number assigned to the
             pattern being coded. The third digit,
             entitled 'XTRA', is used only for inserting
             a pattern. Otherwise it is blank.

cols 11-12   blanks

col 13       SUBJECT
             If the third person singular or the neuter
             form associated with the canonical form
             must be impersonal in the pattern being
             coded, code 'N' in the square allotted;
             otherwise code 'S'.

cols 14-17   CASE GOVERNMENT
             For each case governed by the verb in the
             pattern being coded, write the appropriate
             one-digit number, beginning in the first
             square of the field. Unused squares are
             to be left blank. See table entitled 'CASE
             GOVERNMENT CODES'.

cols 18-25   PREPOSITION GOVERNMENT
             For each preposition + case pair governed
             by the verb in the pattern being coded,
             write the appropriate two-digit number,
             beginning in the first two squares of the
             field. Unused squares are to be left
             blank. See table 'PREPOSITION + CASE
             GOVERNMENT CODES'.

cols 26, 27  ЧТО/ЧТОБЫ CLAUSE GOVERNMENT
             If a ЧТО or ЧТОБЫ clause is governed in
             the pattern being coded, write a 1 in the
             appropriate square.

col 28       КАК PHRASE GOVERNMENT
             If a КАК phrase is governed in the pattern
             being coded, write a 1 in the square.

col 29       INFINITIVE GOVERNMENT
             If the verb governs an infinitive in the
             pattern being coded, then code a 1 in the
             square.

cols 30-80   blanks

Write a Russian example with an English translation to
illustrate the pattern coded above.
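
As an illustration of the pattern-card layout, the sketch below builds an 80-column card image from the fields described in the instructions. It is a reconstruction for illustration only: the function name, the argument names, and the ID number 1234 used in the usage note are hypothetical, and the assignment of col 26 to ЧТО and col 27 to ЧТОБЫ is an assumption based on the order of the headings.

```python
def pattern_card(id_number, pattern_no, subject, cases=(), preps=(),
                 chto=False, chtoby=False, kak=False, inf=False):
    """Return an 80-column pattern card as a string.

    id_number  - four-digit ID of the canonical form (cols 1-4; cols 5-6 XTRA blank)
    pattern_no - two-digit pattern number (cols 8-9; col 10 XTRA blank)
    subject    - 'S' or 'N' (col 13)
    cases      - up to four one-digit case codes (cols 14-17)
    preps      - up to four two-digit preposition + case codes (cols 18-25)
    """
    if subject not in ("S", "N"):
        raise ValueError("subject code must be 'S' or 'N'")
    if len(cases) > 4 or len(preps) > 4:
        raise ValueError("at most four cases and four prepositions fit on a card")

    def flag(value):
        return "1" if value else " "

    card = (
        f"{id_number:04d}".ljust(6)                   # cols 1-6
        + "P"                                         # col 7
        + f"{pattern_no:02d}".ljust(3)                # cols 8-10
        + "  "                                        # cols 11-12
        + subject                                     # col 13
        + "".join(str(c) for c in cases).ljust(4)     # cols 14-17
        + "".join(f"{p:02d}" for p in preps).ljust(8) # cols 18-25
        + flag(chto) + flag(chtoby)                   # cols 26, 27 (order assumed)
        + flag(kak)                                   # col 28
        + flag(inf)                                   # col 29
    )
    return card.ljust(80)                             # cols 30-80 blank
```

Under these assumptions, `pattern_card(1234, 5, "S", cases=(3, 4), preps=(18,))` would encode the fifth pattern of ДЕЛАТЬ (dative, accusative, and ИЗ + genitive).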






CASE GOVERNMENT CODES

NO    CASE

1     nominative case
2     genitive case
3     dative case
4     accusative case
5     instrumental case

















AUXILIARY DICTIONARY
PREPOSITION AND CASE GOVERNMENT CODES

NO    PREP                    CASE

01    В, ВО                 + acc.
02    В, ВО                 + loc.
03    НА                    + acc.
04    НА                    + loc.
05    О, ОБ, ОБО            + acc.
06    О, ОБ, ОБО            + loc.
07    ЗА                    + acc.
08    ЗА                    + instr.
09    ПОД, ПОДО             + acc.
10    ПОД, ПОДО             + instr.
11    ПО                    + acc.
12    ПО                    + dat.
13    ПО                    + loc.
14    С, СО                 + acc.
15    С, СО                 + gen.
16    С, СО                 + instr.
17    ДО                    + gen.
18    ИЗ                    + gen.
19    ОТ, ОТО               + gen.
20    ПРО                   + acc.
21    К, КО                 + dat.
22    НАД, НАДО             + instr.
23    ПРИ                   + loc.
24    ЧЕРЕЗ                 + acc.
25    МЕЖДУ                 + gen.
26    МЕЖДУ                 + instr.
27    ИЗ-ЗА                 + gen.
28    ИЗ-ПОД, ИЗ-ПОДО       + gen.
29    ДЛЯ                   + gen.
30    У                     + gen.
31    ПЕРЕД, ПЕРЕДО         + instr.
32    СРЕДИ                 + gen.
33    ПРОТИВ                + gen.
34    ПУТЕМ                 + gen.
35    ВМЕСТО                + gen.
36    МИМО                  + gen.



cols 1-6     ID NUMBER (cols 5-6 XTRA)
cols 13-42   RUSSIAN (CANONICAL) FORM (first card)

col 7        P
cols 8-9     PATTERN #
col 10       XTRA
col 13       SUB
cols 14-17   CASE GOVERNMENT
cols 18-25   PREPOSITIONAL GOVERNMENT
col 26       ЧТО
col 27       ЧТОБЫ
col 28       КАК
col 29       INF

EXAMPLE:
RUSSIAN
ENGLISH



cols 1-6     ID NUMBER (cols 5-6 XTRA)     (blank)
cols 13-42   RUSSIAN (CANONICAL) FORM      ДЕЛАТЬ
             (first card)

col 7        P                             P
cols 8-9     PATTERN #                     01
col 10       XTRA                          (blank)
col 13       SUB                           S
cols 14-17   CASE GOVERNMENT               4
cols 18-25   PREPOSITIONAL GOVERNMENT      (blank)
col 26       ЧТО                           (blank)
col 27       ЧТОБЫ                         (blank)
col 28       КАК                           (blank)
col 29       INF                           (blank)

EXAMPLE:
RUSSIAN      ОН ДЕЛАЕТ ГОРШОК.
ENGLISH      He is making a pot.



cols 1-6     ID NUMBER (cols 5-6 XTRA)     (blank)
cols 13-42   RUSSIAN (CANONICAL) FORM      ДЕЛАТЬ
             (first card)

col 7        P                             P
cols 8-9     PATTERN #                     02
col 10       XTRA                          (blank)
col 13       SUB                           S
cols 14-17   CASE GOVERNMENT               3 4
cols 18-25   PREPOSITIONAL GOVERNMENT      (blank)
col 26       ЧТО                           (blank)
col 27       ЧТОБЫ                         (blank)
col 28       КАК                           (blank)
col 29       INF                           (blank)

EXAMPLE:
RUSSIAN      ОН НАМ ДЕЛАЕТ НОВОЕ ПРЕДЛОЖЕНИЕ.
ENGLISH      He is making us a new offer.



cols 1-6     ID NUMBER (cols 5-6 XTRA)     (blank)
cols 13-42   RUSSIAN (CANONICAL) FORM      ДЕЛАТЬ
             (first card)

col 7        P                             P
cols 8-9     PATTERN #                     03
col 10       XTRA                          (blank)
col 13       SUB                           S
cols 14-17   CASE GOVERNMENT               4 5
cols 18-25   PREPOSITIONAL GOVERNMENT      (blank)
col 26       ЧТО                           (blank)
col 27       ЧТОБЫ                         (blank)
col 28       КАК                           (blank)
col 29       INF                           (blank)

EXAMPLE:
RUSSIAN      ЭТО ДЕЛАЕТ ГЛИНУ ОГНЕУПОРНОЙ.
ENGLISH      This makes the clay fireproof.



cols 1-6     ID NUMBER (cols 5-6 XTRA)     (blank)
cols 13-42   RUSSIAN (CANONICAL) FORM      ДЕЛАТЬ
             (first card)

col 7        P                             P
cols 8-9     PATTERN #                     04
col 10       XTRA                          (blank)
col 13       SUB                           S
cols 14-17   CASE GOVERNMENT               4
cols 18-25   PREPOSITIONAL GOVERNMENT      18
col 26       ЧТО                           (blank)
col 27       ЧТОБЫ                         (blank)
col 28       КАК                           (blank)
col 29       INF                           (blank)

EXAMPLE:
RUSSIAN      ОН ДЕЛАЕТ ИЗ ГЛИНЫ ГОРШОК.
ENGLISH      He is making a pot out of clay.



cols 1-6     ID NUMBER (cols 5-6 XTRA)     (blank)
cols 13-42   RUSSIAN (CANONICAL) FORM      ДЕЛАТЬ
             (first card)

col 7        P                             P
cols 8-9     PATTERN #                     05
col 10       XTRA                          (blank)
col 13       SUB                           S
cols 14-17   CASE GOVERNMENT               3 4
cols 18-25   PREPOSITIONAL GOVERNMENT      18
col 26       ЧТО                           (blank)
col 27       ЧТОБЫ                         (blank)
col 28       КАК                           (blank)
col 29       INF                           (blank)

EXAMPLE:
RUSSIAN      ОН НАМ ДЕЛАЕТ ИЗ ГЛИНЫ ГОРШОК.
ENGLISH      He is making us a pot out of clay.




In the routine called HYPERPARSE, the set of government
patterns for the predicative block is looked up in the
auxiliary dictionary. Each pattern in the set is compared with
the nominal and prepositional blocks in the sentence in an
attempt to find all possible realizations of the pattern in
that sentence. As in PARSE, sentences containing uncombined
compounds of nominal blocks (and sentences containing nouns
in apposition) will either be parsed incorrectly or else will
not be parsed for the reason given in an appropriate error
message printed out with the sentence.

This section contains a description of how a blocked,
profiled sentence is treated by HYPERPARSE. The sentence
ЭТА ТЕОРЕМА ДВОЙСТВЕННОСТИ ПОЗВОЛЯЕТ ДОКАЗАТЬ ЛЕММЫ ТОЛЬКО
В ЭТОМ СЛУЧАЕ. - 'This theorem of duality allows (one) to
prove the lemmas only in this case.' - is used to illustrate
the process.






The blocks in the sentence are:

NOMINAL BLOCKS:

# of NB   block              cases of           word # of   word # of
                             singular/plural    LB*         RB**

   1      ЭТА ТЕОРЕМА        N/0                1           2
   2      ДВОЙСТВЕННОСТИ     G,D,L/N,A          3           3
   3      ЛЕММЫ              G/N,A              6           6
   4      ЭТОМ СЛУЧАЕ        L/0                9           10

PREPOSITIONAL BLOCKS:

          block              word # of LB*      word # of RB**

          В ЭТОМ СЛУЧАЕ      8                  10

GOVERNING MODIFIER BLOCKS: none

PREDICATIVE BLOCKS:

          block                  word # of LB*      word # of RB**

          ПОЗВОЛЯЕТ ДОКАЗАТЬ     4                  5

GERUND BLOCKS: none

*LB = left boundary
**RB = right boundary










The profile of the sentence, as composed by the PROFILE 
routine, is represented in the printout as follows: 



WORD #  RUSSIAN          ENGLISH            PROFILE  COL I  COL II  COL III  PREP  DUMP

   1    ЭТА              THIS                          B               B
   2    ТЕОРЕМА          THEOREM                       E               E
   3    ДВОЙСТВЕННОСТИ   DUALITY/DUALITIES             X               X
   4    ПОЗВОЛЯЕТ        ALLOWS+DOES ALLOW     B
   5    ДОКАЗАТЬ         TO PROVE              E
   6    ЛЕММЫ            LEMMA/LEMMAS                  X               X
   7    ТОЛЬКО           ONLY                                                        *
   8    В                IN/INTO                                              B
   9    ЭТОМ             THIS
  10    СЛУЧАЕ           CASE                                                 E
  11    .                                      *













NOTE: In the translation field, entries may be separated by a
slash (/), an asterisk (*), or a plus sign (+). A slash is used
to separate different meanings, an asterisk is used to separate
the singular and plural forms of a given meaning, and a plus sign is
used to separate two different forms of a verb, e.g. goes + does
go. The progressive form ('is going') is not stored in the
translations.

In the columns, B indicates "beginning of block," E indicates
"end of block," X indicates a one-word block, and * indicates an
unblocked item.



Internally, PROFILE constructs the following matrix for
the unnested nominal blocks in the sentence, where the entries
are the word numbers of the left and right boundaries of a
block:



NB #    COL I    COL II    COL III

 1      1-2                1-2
 2      3-3                3-3
 3      6-6                6-6
 4

NOTE: COL II is empty because the predicative block (ПОЗВОЛЯЕТ
ДОКАЗАТЬ) governs more than one case.



The predicative block in the sentence governs the accusative
and dative cases in the following combinations only:

Pattern 1: A
Pattern 2: DA

Henceforth, COL II will be considered to have as many
sub-columns as there are governed cases in the pattern under
consideration. Hence, we have COL II_A for Pattern 1, and COL II_D
and COL II_A for Pattern 2.

Replacing the word numbers of the boundaries by the symbol
X to indicate that a nominal block has been entered in a certain
position of the matrix, and substituting the symbol S_i for the
ith nominal block, and considering only the unnested nominal
blocks, we have the following two nominal block matrices for
the sentence and the two patterns:









MATRIX FOR
PATTERN 1     COL I    COL II_A    COL III

S_1             X                     X
S_2             X         X           X
S_3             X         X           X


MATRIX FOR
PATTERN 2     COL I    COL II_D    COL II_A    COL III

S_1             X                                 X
S_2             X         X           X           X
S_3             X                     X           X

where S_i is the ith unnested nominal block.

Since S_2 and S_3 do not agree with the predicative block
(for the nominative case they are plural while the predicative
block is singular), they can be removed from COL I. Also,
neither S_1 nor S_3 is preceded by a governing nominal block, nor
is either in the instrumental case, and therefore each can be
removed from COL III. None of the removals causes obliteration
of the S_i from the matrix.






The matrices after the initial modification are now as 
follows: 



MATRIX FOR
PATTERN 1     COL I    COL II_A    COL III

S_1             X
S_2                       X           X
S_3                       X


MATRIX FOR
PATTERN 2     COL I    COL II_D    COL II_A    COL III

S_1             X
S_2                       X           X           X
S_3                                   X






Since S_3 is only in COL II_A, S_2 must be removed from that
column. This can be done without obliteration, so the matrices
are now as follows:

MATRIX FOR
PATTERN 1     COL I    COL II_A    COL III

S_1             X
S_2                                   X
S_3                       X


MATRIX FOR
PATTERN 2     COL I    COL II_D    COL II_A    COL III

S_1             X
S_2                       X                       X
S_3                                   X

The matrix for Pattern 1 is finished, and yields exactly
one parsing where

S_1 is the subject
S_2 is the adjunct
S_3 is governed by the predicative block





This is the only "correct" parsing, but examination of the
matrix for Pattern 2 shows that another parsing is formally,
though not semantically, possible.

In HYPERPARSE, each sub-column of COL II must have at least
one entry, or the pattern cannot apply. Therefore, for Pattern 2
an additional logical operation which leads to the removal of
S_2 from COL III is used. The operation can be described by
saying that if S_i is the only entry in a given sub-column
of COL II, then the entries of S_i are erased elsewhere in
the row.

The final matrix for Pattern 2, 



PATTERN 2     COL I    COL II_D    COL II_A    COL III

S_1             X
S_2                       X
S_3                                   X

yields exactly one parsing where

S_1 is the subject
S_2 is governed by the predicative block
S_3 is governed by the predicative block

The translation of the sentence when it is parsed in this way
is: 'This theorem allows (one) to prove the lemmas to the duality
only in this case.' (The English word order had to be adjusted
to allow 'duality' to function as the indirect object of 'prove'.)
It should be noted that any indirect object in this context
probably has to be animate, and that sometime in the future,
this restriction should be coded into the pattern. This will
eliminate the semantically unsound parsing here and perhaps in
many other cases.
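
The two logical operations used in this example can be traced with a short sketch. It is an illustration under assumptions, not the project's code: the matrix is a dictionary from block names to sets of column names, the column names SUB, OBJ_D, OBJ_A, and ADJ are invented for the sketch, and the first operation is applied only to the subject and object columns, since several adjuncts may coexist.

```python
def reduce_matrix(rows, object_cols, subject_col="SUB"):
    """Apply the two logical reductions to a HYPERPARSE-style matrix.

    rows        - dict: block name -> set of column names the block is entered in
    object_cols - names of the sub-columns of COL II for the pattern being tried
    """
    rows = {name: set(cols) for name, cols in rows.items()}

    # First operation: a block entered in exactly one subject/object column and
    # nowhere else must take that role, so other entries in the column go.
    for col in [subject_col, *object_cols]:
        forced = [name for name, cols in rows.items() if cols == {col}]
        if len(forced) == 1:
            for name, cols in rows.items():
                if name != forced[0]:
                    cols.discard(col)

    # Second operation: every object sub-column must be filled, so a block that
    # is the only entry in a sub-column is erased elsewhere in its row.
    for col in object_cols:
        holders = [name for name, cols in rows.items() if col in cols]
        if len(holders) == 1:
            rows[holders[0]] = {col}
    return rows


# Pattern 2 (dative + accusative) after the grammatical reduction:
pattern2 = {"S1": {"SUB"}, "S2": {"OBJ_D", "OBJ_A", "ADJ"}, "S3": {"OBJ_A"}}
print(reduce_matrix(pattern2, ["OBJ_D", "OBJ_A"]))
```

Applied to the Pattern 2 matrix, the first operation removes S2 from the accusative sub-column and the second removes it from the adjunct column, leaving the single formal parsing discussed above.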






Just as HYPERPARSE creates a matrix for the unnested
nominal blocks which are in the cases specified by the pattern
being considered, it also creates a matrix of unnested governed
prepositional blocks. The PREP column is the source of an NPREP
matrix which for each pattern has as many sub-columns as there
are governed prepositions in the pattern. Each sub-column must have
at least one prepositional block entry corresponding to a governed
preposition in an input pattern, in order for the pattern to apply.

The governed prepositions P1, P2, and P3 in a given pattern
may lead to the following matrix in a hypothetical sentence,
where each entry comprises the boundaries of a prepositional
block:

P1        P2        P3

1-3       17-18     6-9
12-14

This means that for one of the governed prepositions, two
realizations have been found. The parsings will then be:

(1-3)(17-18)(6-9)

(12-14)(17-18)(6-9).

When both cases and prepositions are governed in a given pattern,
each case parsing must be joined with each preposition parsing
to produce the complete set of parsings for the pattern.
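
Both the preposition parsings and their joining with the case parsings are Cartesian products, which can be sketched as follows. The function names and the tuple representation of block boundaries are assumptions made for illustration.

```python
from itertools import product


def preposition_parsings(nprep):
    """All ways of picking one prepositional block per governed preposition.

    nprep - list of sub-columns; each sub-column is a list of (left, right)
            word-number boundaries of candidate prepositional blocks
    """
    return list(product(*nprep))


def join_parsings(case_parsings, prep_parsings):
    """Pair every case parsing with every preposition parsing."""
    return list(product(case_parsings, prep_parsings))


# The hypothetical matrix from the text: two realizations for P1, one each for P2, P3.
nprep = [[(1, 3), (12, 14)], [(17, 18)], [(6, 9)]]
for parsing in preposition_parsings(nprep):
    print(parsing)
```

If any sub-column is empty, the product is empty, which matches the requirement that every governed preposition be realized for the pattern to apply.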









•The following- paragraphs- give a verbal summary of the 
flow charts of HYPERPARSE: 



A.  Create a subject column, SUBINT (corresponding to the
    reduced COL I of PARSE), containing only those unnested
    nominal blocks which agree with the predicative block
    through the agreement code and the person bits.
    NOTE: If the pattern being tried allows a subject, SUBINT
    becomes the subject column (SUBCOL).

B.  Create two adjunct columns, NORADJ and SPCADJ (each
    corresponding to the reduced COL III of PARSE), containing
    only those nominal blocks which can actually serve as
    adjuncts in the given sentence. NORADJ contains each
    unnested nominal block which is (immediately) preceded by
    a nominal block governing it, or which is in the
    instrumental case; SPCADJ, of which NORADJ is a subset,
    contains, in addition, any nominal block in the accusative
    case.
    NOTE: If the accusative case is governed in the pattern
    being tried, NORADJ is used as the adjunct column
    (ADJCOL); otherwise SPCADJ is used.

C.  After a pattern is read in, create a prepositional column
    NPREP_p, p=1(1)P, P ≤ 4, for each preposition governed in the
    pattern. NPREP_p contains all unnested prepositional phrases
    in the sentence such that the preposition is a governed one
    and the case of its governed nominal block is the one
    specified for that preposition.

D.  Create an object column, OBJCOL_c, c=1(1)C, C ≤ 4, for each
    case in the pattern. OBJCOL_c contains all unnested nominal
    blocks in the sentence which are in the given governed case.

E.  Check the validity of the matrix created in A, B, and D, i.e.
    ascertain that every unnested nominal block in the sentence
    has been entered in at least one of SUBCOL, OBJCOL_c
    (c = 1, ..., C), or ADJCOL.




The flow charts are available to persons having special 
interest in the programming details of this procedure. 






F.  Reduce the matrix according to the following two schemes:

    a) Check each column for an entry which is not in any
       other column. If exactly one such entry is found
       in a column, erase the other entries in the column.

    b) Check each OBJCOL_c for having exactly one
       entry. If this condition exists, then erase the
       other entries in the same row.

G.  Find all possible ways of choosing an entry from each of
    SUBCOL and OBJCOL_c such that no two entries chosen
    are in the same row.

    (NOTE: In some cases, the above choices are made without
    using SUBCOL, e.g., when the predicative in the sentence must
    be impersonal, or as an alternative when the matrix is valid
    without SUBCOL, even when a subject is allowed.)

    Then ascertain that all rows having entries but which have not
    been selected do have an entry in the adjunct column, and
    select those entries.

    Combine the selected entry pattern in G with the selection
    of the first row (if it is non-empty) of the prepositional
    block matrix, and write out the parsing.
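
The column-building steps A, B, and D might be sketched as follows. This is only an illustration under stated assumptions: the `NominalBlock` record, its field names, the case labels, and the use of row numbers as column entries are all hypothetical and do not come from the HYPERPARSE program. The sketch also restricts the accusative blocks added to SPCADJ to unnested ones, which the summary does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NominalBlock:
    """Hypothetical record for one nominal block of the sentence."""
    row: int                        # row of the block in the matrix
    case: str                       # e.g. "nom", "gen", "acc", "ins"
    nested: bool = False            # block lies inside another block
    agrees: bool = False            # agreement code and person bits match
    governed_by_prev: bool = False  # immediately preceded by its governor


def build_columns(blocks, governed_cases, allows_subject=True):
    """Build SUBCOL, ADJCOL, and one OBJCOL per governed case."""
    unnested = [b for b in blocks if not b.nested]

    # Step A: SUBINT holds blocks agreeing with the predicative block.
    subint = {b.row for b in unnested if b.agrees}
    subcol = subint if allows_subject else set()

    # Step B: NORADJ is a subset of SPCADJ, which adds accusative blocks
    # (limited to unnested blocks here, an assumption of this sketch).
    noradj = {b.row for b in unnested
              if b.governed_by_prev or b.case == "ins"}
    spcadj = noradj | {b.row for b in unnested if b.case == "acc"}
    adjcol = noradj if "acc" in governed_cases else spcadj

    # Step D: one object column per governed case.
    objcols = {case: {b.row for b in unnested if b.case == case}
               for case in governed_cases}
    return subcol, adjcol, objcols


blocks = [
    NominalBlock(0, "nom", agrees=True),
    NominalBlock(1, "acc"),
    NominalBlock(2, "gen", governed_by_prev=True),
    NominalBlock(3, "ins"),
    NominalBlock(4, "nom", nested=True, agrees=True),
]
subcol, adjcol, objcols = build_columns(blocks, ["acc"])
```

Because the accusative is governed in this example pattern, the narrower NORADJ serves as the adjunct column, so the accusative block in row 1 is available only as an object.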

Paragraphs A-G are merely a simplified summary of how a successful
parsing involving governed cases and prepositional blocks is
made. Alternatives used when certain conditions (assumed or
ignored in the above) are not met, and the corresponding error
messages, are shown in detailed flow charts.
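
A rough sketch of steps E through G, under simplifying assumptions, may make the matrix manipulation more concrete. Columns are modelled as sets of row numbers keyed by hypothetical names (`SUBCOL`, `OBJCOL_acc`, `ADJCOL`). The reduction is applied in a single pass, and scheme (a) is applied only to the subject and object columns, since the adjunct column may keep several entries. Both choices are assumptions of this sketch and not details confirmed by the summary.

```python
from itertools import product


def is_valid(rows, columns):
    """Step E: every row appears in at least one column."""
    covered = set().union(*columns.values())
    return set(rows) <= covered


def reduce_matrix(columns, adjunct="ADJCOL"):
    """Step F: apply reduction schemes (a) and (b) once each."""
    cols = {name: set(entries) for name, entries in columns.items()}
    # (a) An entry found in no other column must be used by its column.
    for name in cols:
        if name == adjunct:
            continue
        others = set().union(*(e for n, e in cols.items() if n != name))
        unique = cols[name] - others
        if len(unique) == 1:
            cols[name] = unique
    # (b) A lone object entry is removed from every other column.
    for name in cols:
        if name.startswith("OBJCOL") and len(cols[name]) == 1:
            (row,) = cols[name]
            for other in cols:
                if other != name:
                    cols[other].discard(row)
    return cols


def select(rows, columns, adjunct="ADJCOL"):
    """Step G: one distinct row per governed column, rest to adjuncts."""
    governed = [name for name in columns if name != adjunct]
    parsings = []
    for choice in product(*(sorted(columns[n]) for n in governed)):
        if len(set(choice)) != len(choice):
            continue  # two columns chose the same row
        leftover = set(rows) - set(choice)
        if leftover <= columns.get(adjunct, set()):
            assignment = dict(zip(governed, choice))
            assignment[adjunct] = sorted(leftover)
            parsings.append(assignment)
    return parsings


rows = [0, 1, 2]
matrix = {"SUBCOL": {0, 1}, "OBJCOL_acc": {0, 1}, "ADJCOL": {2}}
if is_valid(rows, matrix):
    parsings = select(rows, reduce_matrix(matrix))
else:
    parsings = []
```

In this made-up matrix, rows 0 and 1 can each be subject or accusative object, so two parsings result, with row 2 taken as an adjunct in both. The case in which SUBCOL is not used can be modelled by leaving that column out of the dictionary.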

The auxiliary dictionary and the improved parsing routine,
HYPERPARSE, are only a first approximation to a good language
transfer system. The quality will be improved when the preword(s)
associated with each governed case and the translation(s)
associated with each governed preposition are coded for each
government pattern, along with the translation of the predicative
for that given context. This coding will reduce the number of
meanings to be printed out.






ONR BASIC RECIPIENTS MAILING LIST 



TECHNICAL LIBRARY
DIRECTOR DEFENSE RES. & ENG
ROOM 3C-128, THE PENTAGON
WASHINGTON, D. C. 20301

DEFENSE DOCUMENTATION CENTER                          20
CAMERON STATION
ALEXANDRIA, VIRGINIA 22314

CHIEF OF NAVAL RESEARCH                               02
DEPARTMENT OF THE NAVY
WASHINGTON 25, D. C.
ATTN CODE 437, INFORMATION SYSTEMS BRANCH

DIRECTOR, NAVAL RESEARCH LABORATORY                   06
TECHNICAL INFORMATION OFFICER /CODE 2000/
WASHINGTON 25, D. C.

COMMANDING OFFICER, OFFICE OF NAVAL RESEARCH          10
NAVY #100, FLEET POST OFFICE BOX 39
NEW YORK, NEW YORK, 09599



COMMANDING OFFICER
ONR BRANCH OFFICE
207 WEST 24TH STREET
NEW YORK 11, NEW YORK

OFFICE OF NAVAL RESEARCH BRANCH OFFICE
495 SUMMER STREET
BOSTON, MASSACHUSETTS 02110

NAVAL ORDNANCE LABORATORY
WHITE OAKS
SILVER SPRING 19, MARYLAND
ATTN TECHNICAL LIBRARY

DAVID TAYLOR MODEL BASIN
CODE 042 TECHNICAL LIBRARY
WASHINGTON, D. C. 20007











NAVAL ELECTRONICS LABORATORY
SAN DIEGO 52, CALIFORNIA
ATTN TECHNICAL LIBRARY

DR. DANIEL ALPERT, DIRECTOR
COORDINATED SCIENCE LABORATORY
UNIVERSITY OF ILLINOIS
URBANA, ILLINOIS

AIR FORCE CAMBRIDGE RESEARCH LABS
LAURENCE G. HANSCOM FIELD
BEDFORD, MASSACHUSETTS
ATTN RESEARCH LIBRARY, CRMXL-R

U. S. NAVAL WEAPONS LABORATORY
DAHLGREN, VIRGINIA 22448
ATTN G. H. GLEISSNER /CODE K-4/
ASST DIRECTOR FOR COMPUTATION

NATIONAL BUREAU OF STANDARDS
DATA PROCESSING SYSTEMS DIVISION
ROOM 239, BLDG. 10
ATTN A. K. SMILOW
WASH 25, D. C.

GEORGE C. FRANCIS
COMPUTING LAB, BRL
ABERDEEN PROVING GROUND, MARYLAND

OFFICE OF NAVAL RESEARCH
BRANCH OFFICE CHICAGO
230 N. MICHIGAN AVENUE
CHICAGO, ILLINOIS 60601

COMMANDING OFFICER
ONR BRANCH OFFICE
1030 E. GREEN STREET
PASADENA, CALIFORNIA

COMMANDING OFFICER
ONR BRANCH OFFICE
1000 GEARY STREET
SAN FRANCISCO 9, CALIFORNIA




WAYNE STATE UNIVERSITY
DETROIT, MICHIGAN 48202
ATTN DEPT. OF SLAVIC LANGUAGES,
PROF. HARRY H. JOSSELSON

HEBREW UNIVERSITY
JERUSALEM, ISRAEL
ATTN PROF. Y. BAR-HILLEL

NATIONAL PHYSICAL LABORATORY
TEDDINGTON, MIDDLESEX
ENGLAND
ATTN DR. A. M. UTTLEY, SUPERINTENDENT,
AUTONOMICS DIVISION

COMMANDING OFFICER
HARRY DIAMOND LABORATORIES
ATTN LIBRARY
WASHINGTON, D. C. 20438

COMMANDING OFFICER AND DIRECTOR
U. S. NAVAL TRAINING DEVICE CENTER
PORT WASHINGTON
LONG ISLAND, NEW YORK
ATTN TECHNICAL LIBRARY

DEPARTMENT OF THE ARMY
OFFICE OF THE CHIEF OF RESEARCH & DEVELOPMENT
PENTAGON, ROOM 3D442
WASHINGTON 25, D. C.
ATTN MR. L. H. GEIGER

CAMBRIDGE LANGUAGE RESEARCH UNIT
20 MILLINGTON ROAD
CAMBRIDGE, ENGLAND
ATTN MRS. MARGARET M. BRAITHWAITE

NATIONAL SECURITY AGENCY
FORT GEORGE G. MEADE, MARYLAND
ATTN LIBRARIAN, C-332

LINCOLN LABORATORY
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
LEXINGTON 73, MASSACHUSETTS
ATTN LIBRARY









DR. WILLIAM S-Y WANG
DIVISION OF LINGUISTICS
OHIO STATE UNIVERSITY
COLUMBUS 10, OHIO

PROFESSOR MIEKO HAN
DEPARTMENT FOR ASIAN STUDIES
UNIVERSITY OF SOUTHERN CALIFORNIA
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007










